8385834: Tighten ListFormat.getInstance(String[]) behavior for invalid placeholders#31377
Conversation
|
👋 Welcome back naoto! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
Webrevs
|
justin-curtis-lu
left a comment
There was a problem hiding this comment.
Duplicate placeholder checking looks good to me. New test cases look good and exhaustive. Some other comments below.
| var patterns = new String[]{ | ||
| "{0}, {1}", | ||
| "{0}, {1}", | ||
| "{0}, and {1}", | ||
| "{0} and {1}", | ||
| "{0} {1} {2}" | ||
| }; | ||
| patterns[index] = "{0}".repeat(100_000); | ||
| patterns[index] = invalidPattern; |
There was a problem hiding this comment.
I understand that the current implementation rejects this input immediately because it contains duplicate placeholders, but I think this 100_000 length case is still worth preserving as a regression test. I might retain it as a simple standalone test that preserves the long string input (and probably only needs to be checked against one pattern, not all). I say this because I don't think the test should be relying on an assumption of the current implementation.
| var positions = new int[3]; | ||
| for (int i = 0; i < positions.length; i++) { | ||
| positions[i] = pattern.indexOf("{" + i + "}"); | ||
| var ph = "{" + i + "}"; |
There was a problem hiding this comment.
ListFormat uses MessageFormat, which means that any patterns taken by the former get passed to the latter.
What would we expect to happen if {3} was passed as a pattern. Do we expect that to be printed out literally? Do we expect to reject it because it takes the form of a placeholder but exceeds the 0-2 value range?
In the current form, depending on the number of elements, it could either be printed out literally, or it could be interpreted as a MessageFormat pattern. For example,
String[] p = {
"{3} {0}, {1}",
"{0}, {1}",
"{0}, and {1}",
"",
""
};
var lf = ListFormat.getInstance(p);
lf.format(List.of("a", "b", "c", "d")); // -> d a, b, c, and d
lf.format(List.of("a", "b", "c")); // -> {3} a, b, and c
I would not be opposed to rejecting any placeholders that are not in the form of only 0, 1, and 2. I am going to guess that no locales rely on {HARD_CODED_NUMBER} as being a part of their list patterns. What do you think?
There was a problem hiding this comment.
Good point. I think rejecting is fine. In fact, I made it stricter as any curly brace use other than 0/1/2 now throws an exception because MessageFormat has its own parsing, which prevents "{foo}" from working as a literal.
| * can be created with {@link #getInstance(String[])}. The String array to the | ||
| * method specifies the delimiting patterns for the start/middle/end portion of | ||
| * the formatted string, as well as optional specialized patterns for two or three | ||
| * method specifies the delimiting patterns for the {@code start}/{@code middle}/{@code end} |
There was a problem hiding this comment.
I think "The string array passed to the method" reads better.
| * If parsing of any pattern string for start, middle, end, two, or three fails, | ||
| * it throws an {@code IllegalArgumentException}. | ||
| * } | ||
| * If {@code two} or {@code three} pattern string is empty, it falls back to |
There was a problem hiding this comment.
Recommend: If the {@code two} ...
| * If parsing of any pattern string for {@code start}, {@code middle}, | ||
| * {@code end}, {@code two}, or {@code three} fails, including duplicate | ||
| * placeholders, "{2}" in patterns other than the {@code three} element | ||
| * pattern, or any use of "{" or "}" other than "{0}", "{1}", or "{2}", |
There was a problem hiding this comment.
We won't have support for literal braces which seems reasonable for these list patterns. There could always be an escaping mechanism for such support, but seems unnecessary for now unless there was a genuine request/demand.
| * @return validation result | ||
| */ | ||
| private static boolean validatePlaceholders(String pattern) { | ||
| for (int i = 0; i < pattern.length(); i++) { |
There was a problem hiding this comment.
Since validatePlaceholders already iterates through the entire pattern string to validate braces, I think we can just merge the logic of findPlaceholders and validatePlaceholders into this main loop to do validation and placeholder recording.
Instead of
var positions = new int[3];
for (int i = 0; i < positions.length; i++) {
var ph = "{" + i + "}";
positions[i] = pattern.indexOf(ph);
if (positions[i] >= 0) {
// Check for duplicate placeholders
if (pattern.indexOf(ph, positions[i] + PLACEHOLDER_LENGTH) >= 0) {
return null;
}
}
}
I think we can just initialize positions as new int[] {-1, -1, -1}; and if positions[index] != -1 we can fail. We can derive the correct index via pattern.charAt(i + 1) - '0' (given this code occurs within the ch == '{' block).
There was a problem hiding this comment.
Yes. That'd be simpler/performant. Fixed.
Made changes to the invalid placeholder handling in
ListFormat.getInstance(String[]), stemming from JDK-8385736.This rejects duplicate placeholders and
"{2}"outside thethreepattern, with spec and test updates. Also includes minor Javadoc formatting cleanups. The invalid-long-pattern test added with JDK-8385736 has been repurposed to cover the newly specified invalid-placeholder cases, because the repeated-placeholder input is now rejected immediately by duplicate-placeholder validation.Since this changes the behavior of the method, a corresponding CSR has been drafted.
Progress
Issues
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31377/head:pull/31377$ git checkout pull/31377Update a local copy of the PR:
$ git checkout pull/31377$ git pull https://git.openjdk.org/jdk.git pull/31377/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 31377View PR using the GUI difftool:
$ git pr show -t 31377Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31377.diff
Using Webrev
Link to Webrev Comment